Today we will…
Hi, I’m Dr. Rehnberg!
I am a transplant to the West Coast – PA to MO to MI to CA.
My favorite things are being outside, drinking tea, and watching reality tv.
I am teaching this course for the first time – please bear with me as I get materials ready for Canvas.
I have a genetic, degenerative eye disease called Stargardt disease, which causes me to have poor vision, even with corrective lenses.
What this means for you:
When I am helping you on your computer, please make the font large and turn the brightness up.
I have difficulty recognizing faces – please be patient!
Questions?
We will be joined in class by Sophia.
Sophia is…
A second-year Statistics major pursuing a Data Science minor.
Originally from San Ramon in the East Bay Area.
A golfer, dancer, and crocheter!
I am looking forward to reading your introductions on Canvas Discussions!
R’s strengths are…
… handling data with lots of different types of variables.
… making nice and complex data visualizations.
… having cutting-edge statistical methods available to users.
R’s weaknesses are…
… performing non-analysis programming tasks, like website creation (Python, Ruby, …).
… hyper-efficient numerical computation (MATLAB, C, …).
… being a simple tool for all audiences (SPSS, Stata, JMP, Minitab, …).
The heart and soul of R are packages.
To install a package use:
install.packages("tibble")
Importantly, R is open-source.
This means packages are created by users like you and me!
Being a good open-source citizen means…
… sharing your code publicly when possible (later in this course, we’ll learn about GitHub!).
… contributing to public projects and packages, as you are able.
… creating your own packages, if you can.
… using R for ethical and respectful projects.
RStudio is an IDE (Integrated Development Environment).
RStudio was released in 2011 by J.J. Allaire.
In 2014, RStudio hired Hadley Wickham as Chief Data Scientist. They now employ around 20 full-time developers.
Recall: You cannot sell R code, so packages created by RStudio’s team are freely available.
A directory is just a fancy name for a folder.
Your working directory is the folder that R “thinks” it lives in at the moment.
getwd()
[1] "/Users/zrehnber/Documents/Teaching/Stat_331/S23/lecture_slides/W1_intro_R"
This file lives in my user files Users/…
…on my account zrehnber/ …
…in my Documents folder …
…in a series of organized folders.
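A minimal sketch of inspecting the working directory yourself. getwd() is the base R call that reports where R currently "lives"; basename(), dirname(), and file.path() are base R helpers added here for illustration, and the folder and file names passed to file.path() (Stat331, lab1, lab1.qmd) are hypothetical.

```r
# Ask R which folder it currently "thinks" it lives in
wd <- getwd()
wd

# Split the path: the last folder, and everything above it
basename(wd)
dirname(wd)

# Build a path from folder names (hypothetical names); file.path() joins with "/"
lab_file <- file.path("Stat331", "lab1", "lab1.qmd")
lab_file
#> [1] "Stat331/lab1/lab1.qmd"
```

Your output for the first three lines will differ, since it depends on where your own files live.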
Create a directory for this class!
Is it in a place you can easily find it?
Does it have an informative name?
Are the files inside it well-organized?
An R Project is basically a “flag” planted in a certain directory.
When you double click an .Rproj file, it:
Opens RStudio
Sets the working directory to be wherever the .Rproj file lives.
Links to GitHub, if set up (more on that later!)
RStudio Projects are great for reproducibility!
You can send anyone your folder with your .Rproj file and they will be able to run your code on their computer.
We will be using RStudio Projects throughout this course.
You can send your project to someone else, and they can jump in and start working right away.
This means:
Files are organized and well-named.
References to data and code work for everyone.
Package dependency is clear.
Code will run the same every time, even if data values change.
Analysis process is well-explained and easy to read.
For example, /Users/zrehnber/Stat331/lab1/ rather than Desktop/stuff/.
If you put something like this at the top of your .qmd file (more on Quarto later), I will set your computer on fire:
Setting working directory by hand = BAD!
That directory is specific to you!
R Markdown and Quarto (more on these later) ignore this code when knitting!
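Instead of setting the working directory by hand with setwd(), one common alternative is to keep data inside the project folder and refer to it with a path relative to that folder. The sketch below fakes a project folder inside tempdir(); the folder name stat331_lab1, the file scores.csv, and its contents are all hypothetical.

```r
# Stand-in for a project folder (hypothetical name, created in a temp location)
project <- file.path(tempdir(), "stat331_lab1")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)

# Pretend this data file ships with the project (made-up values)
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)),
          file.path(project, "data", "scores.csv"), row.names = FALSE)

# In a real RStudio Project the working directory is already the project
# folder, so read.csv("data/scores.csv") is all you would write.
# This demo prepends `project` only because no .Rproj file is opened here.
scores <- read.csv(file.path(project, "data/scores.csv"))
nrow(scores)
```

The relative part, "data/scores.csv", never mentions a user name or a Desktop folder, which is what lets the same code run on someone else's computer.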
A value is a basic unit of stuff that a program works with.
Values have types:
Variables are names that refer to values.
A variable is like a container that holds something: when you refer to the container, you get whatever is stored inside.
We assign values to variables using the syntax object_name <- value.
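A short sketch of values, their types, and assignment. class() is used here to report a value's type; the variable names my_number and my_word are made up for the example.

```r
# Values have types; class() reports the type of a value
class(3.14)     # "numeric"
class(2L)       # "integer" (the L suffix makes an integer)
class("hello")  # "character"
class(TRUE)     # "logical"

# Assign values to variables with object_name <- value
my_number <- 5
my_word <- "tea"

# Referring to a variable gives back whatever is stored in it
my_number * 2   # 10
my_word         # "tea"
```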
Homogeneous: every element has the same data type.
Vector: a one-dimensional column of homogeneous data.
Matrix: the next step after a vector; it’s a set of homogeneous data arranged in a two-dimensional, rectangular format.
Heterogeneous: the elements can be of different types.
List: a one-dimensional column of heterogeneous data.
Dataframe: a two-dimensional set of heterogeneous data arranged in a rectangular format.
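A sketch that builds one of each structure with base R functions (c(), matrix(), list(), data.frame()). All object names and values are made up for illustration. The mixed line shows what "homogeneous" means in practice: R quietly converts everything in a vector to a single type.

```r
# Vector: one dimension, every element the same type
heights <- c(160, 172, 181)

# Mixing types in a vector forces one type: the 1 becomes "1"
mixed <- c(1, "two")
class(mixed)     # "character"

# Matrix: two dimensions, one type
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)           # 2 3

# List: one dimension, elements may be different types
info <- list(name = "Ana", year = 2, minor = TRUE)

# Data frame: rectangular; each column is a vector, and columns may differ in type
students <- data.frame(name = c("Ana", "Ben"), year = c(2, 3))
str(students)
```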
We use square brackets ([]) to access elements within data structures.
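A few illustrative bracket lookups on small made-up objects. Single brackets [ ] are what the sentence above describes; the double brackets [[ ]] for pulling one element out of a list, and the df$year shortcut, are extra base R syntax included here for comparison.

```r
scores <- c(90, 85, 77)
scores[2]            # 85: second element of a vector

m <- matrix(1:6, nrow = 2)
m[2, 3]              # 6: row 2, column 3 (matrix() fills column by column)

info <- list(name = "Ana", year = 2)
info[["name"]]       # "Ana": double brackets pull out a single element

df <- data.frame(name = c("Ana", "Ben"), year = c(2, 3))
df[2, "year"]        # 3: row 2 of the year column
df[df$year > 2, ]    # only the rows where year is greater than 2
```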
We can combine logical statements using and, or, and not.
(X AND Y) requires that both X and Y are true.
(X OR Y) requires that at least one of X or Y is true.
(NOT X) is true if X is false, and false if X is true.
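In R, AND, OR, and NOT are written &, |, and !. A small sketch with a made-up value of x:

```r
x <- 7

# AND (&): both conditions must be TRUE
x > 5 & x < 10      # TRUE

# OR (|): at least one condition must be TRUE
x < 5 | x == 7      # TRUE

# NOT (!): flips TRUE to FALSE and vice versa
!(x > 5)            # FALSE

# These operators work element by element on vectors, too
v <- c(2, 6, 12)
v > 5 & v < 10      # FALSE TRUE FALSE
```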
seq(from = 1, to = 10, by = 1
Error: <text>:2:0: unexpected end of input
1: seq(from = 1, to = 10, by = 1
^
seq(from = 1, to = 10 by = 1)
sequence(from = 1, to = 10, by = 1)
Error in sequence.default(from = 1, to = 10, by = 1): argument "nvec" is missing, with no default
sqrt(‘1’)
Error in my_obj(5): could not find function "my_obj"
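For comparison, corrected versions of two of the calls above. seq() needs a closing parenthesis and a comma between every argument, and sequence() is a different function with different arguments, so the fix is to call seq(). The sqrt() fix assumes the intent was the square root of the number 1, not of a piece of text.

```r
# Closing parenthesis, and commas between every argument
seq(from = 1, to = 10, by = 1)
#> [1]  1  2  3  4  5  6  7  8  9 10

# Pass a number, not a quoted string
sqrt(1)
#> [1] 1
```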
Scary red text does not necessarily mean something went wrong! This is just R communicating with you.
Often, R will give you a warning.
This means that your code did run…
…but you probably want to make sure it succeeded.
Does this look right?
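One common way to trigger a warning (an illustrative example, not necessarily the one from the original slide): converting text to numbers when some entries are not numbers. The code runs, but R warns that it replaced the entry it could not convert with NA, which is worth checking before moving on. The names values and as_numbers are made up.

```r
values <- c("1", "two", "3")
as_numbers <- as.numeric(values)
#> Warning message:
#> NAs introduced by coercion
as_numbers
#> [1]  1 NA  3
```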
If the word Error appears in your message from R, then you have a problem.
Error: Object some_obj not found.
Error: Object of type ‘closure’ is not subsettable.
Error: Non-numeric argument to binary operator.
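A sketch that reproduces each of these three errors on purpose and captures the message instead of stopping. The helper show_error() is a hypothetical name for this example; it wraps base R's tryCatch() and conditionMessage(). The exact wording shown assumes R is running in English.

```r
# Hypothetical helper: return an error's message instead of stopping
show_error <- function(expr) {
  tryCatch(expr, error = function(e) conditionMessage(e))
}

# Object not found: the name was never created, or it is misspelled
msg1 <- show_error(some_obj)

# Closure not subsettable: square brackets on a function
# (mean is a function, not a data structure)
msg2 <- show_error(mean[1])

# Non-numeric argument: arithmetic with a character string
msg3 <- show_error("1" + 1)

msg1   # "object 'some_obj' not found"
msg2   # "object of type 'closure' is not subsettable"
msg3   # "non-numeric argument to binary operator"
```

"Closure" is R's internal name for a function, which is why that message shows up when you accidentally use a function's name where you meant a data object.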
Look at the help file for the function!
When all else fails, Google your error message.
Leave out the specifics.
Include the function you are using.
What’s wrong here?
The components of the Practice Activity are described below:
Part One:
This file has many mistakes in the code. Some are errors that will prevent the file from knitting; some are mistakes that do NOT result in an error.
Fix all the problems in the code chunks.
Part Two:
Follow the instructions in the file to uncover a secret message.
Submit the name of the poem as the answer to the Canvas Quiz question.
Today we will…
Scripts (File > New File > R Script) are files of code that are meant to be run on their own.
Scripts can be run in RStudio by clicking the Source button at the top of the editor window when the script is open.
You can also run code interactively in a script by:
highlighting lines of code and hitting run.
placing your cursor on a line of code and hitting run.
placing your cursor on a line of code and hitting ctrl + enter or command + enter.
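Running a whole script from code uses base R's source() function, which is roughly what RStudio's Source button does for you. A minimal sketch: the script here is a made-up two-line file written to a temporary location as a stand-in for an .R file you saved.

```r
# Write a tiny two-line script to a temporary file (stand-in for a saved .R file)
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- 1:5", "total <- sum(x)"), script_path)

# Run the whole script, top to bottom
source(script_path)

total   # 15
```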
Notebooks are an implementation of literate programming.
They allow you to integrate code, output, text, images, etc. into a single document.
E.g.,
Reproducibility!
Markdown (without the “R”) is a markup language.
It uses special symbols and formatting to make pretty documents.
Markdown files have the .md extension.
R Markdown (with the “R”) uses regular Markdown, AND it can run and display R code.
Quarto unifies and extends the R Markdown ecosystem.
Quarto is the next-generation version of R Markdown.
Consistent implementation of attractive and handy features across outputs:
More accessible defaults and better support for accessibility.
Guardrails that are helpful when learning:
Support for other languages like Python, Julia, Observable, and more.
Quarto makes moving between outputs straightforward.
A few useful tips for formatting the Markdown text in your document:
R code chunk options are included at the top of each code chunk, prefaced with a #| (hashpipe).
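A sketch of what the top of an R code chunk might look like in a .qmd file. label, echo, and warning are standard Quarto chunk options; the label name speed-summary is made up, and cars is a dataset built into R. Outside Quarto, the #| lines are ordinary R comments, so the code below also runs as plain R.

```r
#| label: speed-summary
#| echo: true
#| warning: false
# In a .qmd file, these lines sit inside an R code chunk.
# echo: true     shows this code in the rendered document
# warning: false hides any warnings the code produces
summary(cars$speed)
```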
To take your .qmd file and make it look pretty, you have to render it.
Quarto CLI (command line interface) orchestrates each step of rendering:
knitr or jupyter.
When you click Render: